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Abstract: Wc present a sequential Monte Carlo sampler algorithm for 
the Bayesian analysis of generalised linear mixed models (GLMMs). These 
models support a variety of interesting regression-type analyses, but per- 
forming inference is often extremely difficult, even when using the Bayesian 
approach combined with Markov chain Monte Carlo (MCMC). The Sequen- 
tial Monte Carlo sampler (SMC) is a new and general method for producing 
samples from posterior distributions. In this article we demonstrate use of 
the SMC method for performing inference for GLMMs. We demonstrate 
the effectiveness of the method on both simulated and real data, and find 
that sequential Monte Carlo is a competitive alternative to the available 
MCMC techniques. 
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1. Introduction 

Effective strategies for generalised linear mixed model (GLMM) analysis con- 
tinues to be a vibrant research area. Reasons include: 

• GLMMs have become an indispensable vehicle for analysing a significant 
portion of contemporary complex data sets. 

• GLMMs are inherently difficult to fit compared with ordinary linear mixed 
models and generalised linear models. 
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• Existing strategies involve a number of trade-offs concerning, for example, 
approximation accuracy, computational times and Markov chain conver- 
gence. 

Overviews of the usefulness and difficulties of GLMM-based analysis may be 
found in, for example, [23, 27] and [28]. 

Most practical GLMM methodology falls into two categories: analytic approx- 
imations (e.g. [2]) and Monte Carlo methods (e.g. [(>]). Monte Carlo methods 
have the advantage of providing direct approximations to quantities of interest 
[1]. On the other hand, analytic approximations, such as Laplace approximation, 
are indirect and prone to substantial bias (e.g. [3]). The most common Monte 
Carlo approach is Markov Chain Monte Carlo (MCMC), where approximation 
accuracy is associated with Markov chain convergence. 

[30] is a recent example of research concerned with practical GLMM analy- 
sis via Markov chain Monte Carlo. Those authors explored use of the MCMC 
computing package WinBUGS and showed it to exhibit good performance for a 
number of examples. 

One of the major difficulties associated with using MCMC is the need to as- 
sess convergence. Popular methods for convergence assessment often rely on the 
comparison of multiple sample output; see [7] for a comparative review. These 
methods can invariably fail to detect a lack of convergence and one needs to be 
cautious when taking such an approach. Another major drawback of MCMC is 
the difficulty in designing efficient samplers for complex problems. The use of 
historical information from MCMC sample paths has to be treated very care- 
fully, so that the equilibrium distribution of the Markov chain is not disturbed. 
Various methods have been proposed in the literature, (see [14]), however the 
practical applicability of these so-called adaptive methods can be limited. 

Both problems associated with MCMC discussed above are inherently due to 
the Markovian nature of the MCMC sampler. Sequential Monte Carlo (SMC) 
methods provide an alternative framework for posterior sampling, which is not 
dependent on the convergence of a Markov chain as in the MCMC sampler case. 
Though careful assessment of posterior samples is also applicable in the SMC 
case, these are more in line with the standard Monte Carlo methods. SMC sam- 
plers can be seen as an extension of the well known importance sampling method. 
The fact that SMC methods do not rely on Markov chain theory means that it 
is a more flexible sampler. In the sense that for example, if the historical sample 
path of the SMC sampler is informative for the design of an efficient algorithm, 
this can be done quite easily in the SMC framework. In this article we show that 
sequential Monte Carlo methods provide an effective means of Bayesian GLMM 
analysis. We provide a general yet simple framework for efficient design of the 
sampler, and demonstrate that this approach is a viable alternative to MCMC, 
and since SMC samplers require a number of user-specified inputs, we will give 
recommendations in the GLMM framework on how these are chosen. 

Section 2 contains a brief summary of Bayesian approaches to generalised 
linear mixed models. In Section 3 we provide details on analysis for such models 
via sequential Monte Carlo sampling. In Section 4 we present two examples. In 
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a simulated Poisson regression example, we compare the efficiencies of the SMC 
sampler with alternative Monte Carlo methods, and then demonstrate the ef- 
fectiveness of the SMC sampler in a binary logistic regression example involving 
real data. Some comparisons of algorithm efficiencies for the two examples are 
carried out in Section 5 and concluding remarks are given in Section 6. The 
software used for this paper is available from the authors on request. 



2. Bayesian generalised linear mixed models 

GLMMs for canonical one-parameter exponential families (e.g. Poisson, logistic) 
and Gaussian random effects take the general form 

[y|/3, u, G] = cxp{y^(X^ + Zu) - 1^6(X/3 + Zu) + l^c(y)}, (1) 

[u|G]^iV(0,G) (2) 

where here, and throughout, the distribution of a random vector x is denoted 
by [x] and the conditional distribution of y given x is denoted by [y|x]. In 
the Poisson case b{x) — e^, while in the logistic case b{x) = log(l + e^'). An 
important special case of (1)~(2) is the variance components model 

[y|/?,u,a2i,...,a2^] - exp{y^(X;9 -f Zu) - 1^&(X;3 + Zu) + l^c(y)}. 



u 
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Ul 



Ul 



iV(0,blockdiagi 
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(3) 

where qg is the number of elements in Uf. While (3) is not as general as (1)- 
(2) it still handles many important situations such as random intercepts and 
generalised additive models [30]. With simplicity in mind, we will focus on this 
GLMM for the remainder of the paper. However, the methodology of Section 3 
is quite general and is extendable to more elaborate GLMMs. 

In this study we have worked with diffuse conjugate priors although, once 
again, the methodology extends to other types of priors. To ensure scale-invariance 
all continuous predictors are standardised at the start of the Bayesian analysis. 
The prior on /3 is a diffuse Gaussian: 



(4) 



for some large cr| > 0. The prior for (cr^i, . . . , cr^^) is assumed to have indepen- 
dent components; i.e. 



A number of possibilities for [cr^^] could be considered [IS]. These include an 
inverse gamma distribution, a uniform distribution, and a folded Cauchy distri- 
bution. In this paper we use a conditionally conjugate inverse gamma distribu- 
tion: 
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(5) 
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This prior distribution was advocated by [30] for A^i — 0.01. The prior is there- 
fore fairly non-informative, yet results in a slightly simpler sampling procedure. 

It will be convenient to introduce some additional notation to enable the 
model to be described more succinctly. We start by writing 



C = [X Zl and 1/ = 



13 
u 



We also write qp for the number of elements in /3, and 

V = blockdiag(cr^I,^ ,al^Iq^,..., al^lq^ ) 

for the prior covariancc of v. Writing cr^ for (cr^j^, . . . , c^^), we can then combine 
(3), (4) and (5) to give the joint density of all parameters and data: 



[y, V, a^] = exp y^Cv - l^h{Cv) + l^c(y) - \v'^Y-'v - ^ f \og{al,) 

L 

+ {Aui \og{Aui) - logr(A„^) - [Aui + 1) log(cr2^) - Aui/crli,] 



From this, and noting that v^Y^v = ||/3|p/o-^ + iP/cr^^, it is clear 

that the density of the posterior distribution of the parameters is simply pro- 
portional to the function 



7r(i/, a^) = exp 



1=1 



(6) 



In Section 3, we will develop a sequential Monte Carlo sampler to produce 
samples from the distribution proportional to tt. 

3. Sequential Monte Carlo sampling 

The Monte Carlo approach to GLMM analysis performs inference by drawing 
samples from the joint posterior distribution of the parameters 6 = (/?, u, cr^j^, 
■ ■ • '"'ml)- We write 7r(0) for the (unnormalised) density of this posterior distri- 
bution. Instead of using a Markov chain with tt as its stationary distribution to 
produce these samples, the sequential Monte Carlo (SMC) sampling method is 
a generalisation of importance sampling that produces a weighted sample from 
TT while retaining some of the benefits of MCMC analysis [8] . 

The use of SMC for static problems (as opposed to particle filters for dynamic 
problems; [12] requires the introduction of auxiliary distributions ttq, tti,..., 
TTs-i- At stage s of the sampler we use a (weighted) sample from the previous 
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distribution tt^^i to produce a (weighted) sample from tTs- We set 775 — tt so 
that after S stages we have a sample from the posterior distribution of interest. 
The introduction of the intermediate distributions allows the initial distribution 
ttq to be gradually corrected to resemble the target distribution tt, and can 
often overcome problems such as particle depletion where, if the two consecutive 
distributions are too dissimilar, then a small number of particles carry all the 
weight in the final sample. 

The auxiliary distributions can be constructed in several ways: [5] introduces 
the observations incrementally to evolve the distribution from the prior to the 
posterior; [I'.i] uses a similar technique, but increases the size of the state space 
as more observations are added; [n] use 

TTs oc tTq ^° tt'^" , where (7) 
= 70 < 71 < • • • < 7s = 1 

and TTq is chosen to be the prior distribution for the parameters. In this arti- 
cle, due to the diffuse nature of the prior distribution, the initial distribution 
TTq is instead chosen to be a multivariate Normal distribution with mean and 
covariance matrix chosen based on estimates obtained using classical methods 
for fitting GLMMs. 

The SMC sampler algorithm starts by sampling N samples, termed "parti- 
cles", from the initial distribution ttq. Denote by 0^ the ith particle at initial 
stage s = 0, and allocate weight = 1 to each of the N particles, so that 
{6'^,w^} is a weighted sample from ttq. 

The SMC sampling technique uses the weighted particles from distribution 
TTs-i to produce particles from distribution tTs through moving, reweighting and 
(possibly) resampling; see [8]. For simplicity, the formulation we use is that 
described in detail in Section 3.3.2.3 of that paper, which essentially results in 
the resample-move algorithm used by [5] and [19]. This is also similar to the 
annealed importance sampling method of [24] , but the use of resampling within 
the algorithm greatly improves the efficiency of the method. Writing 0^ for the 
ith particle at stage s, at each stage < s < S" of the algorithm we perform the 
following steps: 

Reweight Given N weighted particles {O^^^ ,w1~^} from tTs-i, set 

{Ol~^ , wf} is now a weighted sample from tTj. 
Resample If the effective sample size (ESS, [20]), defined as (Eti O V 
is less than kN , where k is some constant typically taken to be 
1/2, then we perform stratified resampling [21]. ESS estimates the number 
of simple random samples from the target distribution that are required 
to obtain an estimate with the same Monte Carlo variation as the esti- 
mate using the N weighted particles. Resampling refers to a suite of tech- 
niques that replicate the particles in such a way that the expected value 
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of particlc-bascd estimators is retained, but particles with low weights are 
discarded and particles with high weights multiplied; see [11] for a sum- 
mary and comparison of several such approaches. This standard practise 
in the sequential Monte Carlo literature allows further computational ef- 
fort to focus on samples that are likely to contribute non-negligibly to the 
final estimate. Finally, resampled particle weights are reset to {wf} = 1. 
Move Let {0s,wf},i = 1, ... ,7V denote samples from at the current distribu- 
tion TTs after reweighting and (possibly) resampling. To increase particle 
diversity we replace each sample according to 

where Kg is an MCMC transition kernel that admits tt^ as stationary 
distribution. [1"] provides detail on MCMC transition kernels. 

It is known [s] that this particular formulation of the SMC sampling is sub- 
optimal, in terms of the variance of the importance weights {wf}, especially 
if the distributions on consecutive stages are too far apart. However, since the 
optimal formulation is intractable, and for the static problem we have here it 
is easy to ensure that the difference between tTs-i and tt^ is small, we use this 
simpler formulation. (Contrast this situation with that of an SMC algorithm 
for a dynamic problem, or the technique of ['>] where data arrive over time and 
there is no control over the distance between tt^-i and tts.) 

The "parameters" of the algorithm that must be chosen when implementing 
this sampler arc therefore: 

• the initial distribution ttq, 

• the sequence of values 7^ that govern the rate of transition from the initial 
distribution ttq to the posterior distribution tt, 

• the transition kernels K^, used to move the particles within the distribu- 
tion proportional to tt^, and 

• the number of particles N. 

• the total number of distributions S. 

Specific choices of these parameters used in this paper are discussed in the 
following subsections. We give a more algorithmic description of our method in 
the Appendix. 

3.1. Initial distribution tvq 

As previously observed, using the prior distribution as an initial distribution is 
flawed in this case, since the prior is highly diffuse. Instead we use the penalised 
quasi-likclihood (PQL) method [2] to obtain an approximate fit of the model. 
Let i/pQL and 5pQL be the estimate of u and obtained using PQL. We will 
calculate a normal approximation of the posterior distribution of u centred 
at this approximate maximum likelihood estimate, which can then be used to 
construct an initial distribution ttq for the SMC sampling procedure. Note from 
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(6) that 



TT{iy, a^) = exp {y^Ci/ - 1^6(Ci/) - ^i/^V-ii/ + f{a^)} 



where / is sonic function that docs not depend on v. It is a simple calculation 
to see that the matrix of second derivatives with respect to components of i/ is 
— C"^diag{6"(Ci/)}C — V^^; wc therefore initialise our algorithm by taking a 
normal distribution for u with mean Ppql and covariance matrix 



E = 



C^diag{6"(CPpQ0}C + V; 



(8) 



where the entries in Vpql are taken from ffpQL- 

It remains to specify a distribution for the variance vector . We have found 
it convenient to specify this conditional on u, and of a form that is consistent 
with the posterior distribution tt. We take 



7ro(a^ 



n 

e=i 



(A 



|u.f) 



TiA^i + qe/2) 



(9) 

i.e. the ct^^ are conditionally independent given i/, and each has an inverse 
gamma distribution depending on the corresponding components of u. Hence, an 
initial sample from ttq can easily be generated by first sampling from the normal 
distribution for v then sampling the cr^^ from their conditional distributions. 
Furthermore, we will see in Sections 3.2 and 3.3 that this results in simple 
conditional distributions for cr^^ at all stages of the sampler. 

Putting together the initial distributions of i/ and cr^, we see that 



TToiv, cr^) oc exp 



2 



(10) 



In the GLMM examples of this paper, there is no reason to suspect that 
the posterior distribution is particularly spread out or multi-modal. Hence this 
TTo is sufHcient, as demonstrated by the fact that several different Monte Carlo 
methods provide identical inference in the examples of Section 4. In other ex- 
amples where these complications are likely to occur, ttq should be chosen with 
an inflated variance, or to be a t distribution, to help dominate the posterior 
distribution. Note that multi-modality is less of a problem for the sequential 
Monte Carlo approach than it would be for MCMC, since there is no difficulty 
in having samples in both modes simultaneously, whereas an MCMC approach 
must move between the nodes through areas of low posterior probability. 
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3.2. Sequence of intermediary distributions 

In this section we describe the sequence of distributions used to transition from 
ttq to TVs = TT. Recall that we choose to use the formulation (7). Using (10) and 
(6) it is clear that the intermediate distributions are proportional to tts where 



7rs(i/,a^) 



exp 



{y^C l^b{Cu) ^||/3f + + |) log(A„, + i||u,f ) 



- J2 {i^ui + f + 1) \og{al,) + (A„, + ||u,f } 



+ (1-7s){-2(''-'^pq0 S '(i/-i>pql) 



exp 



(=1 

,{y^Cu-l^biCu) -L\\p\\^}+J2iAue + f ) log(A„, + i||u,f ) 



+ (1 - 7.){ - 2(1^ - u,^^.f^-\u - u,^,)} 

L 

- {(A,, + f + 1) logcT^, + [A^, + i||u,f )/a2,} 



(11) 



In the absence of any additional information about the shapes of these distri- 
butions, it is difhcult to specify a sensible generic sequence of 7s values. Hence 
for the rest of the paper we choose to increase 7s from 70 = to 75-5 = 1 in 
a linear fashion, that is, values of 7s are sequentially incremented by the same 
amount. Additionally, we append 75-4 = • • • = 7s = 1 to this sequence to give 
five stages at the end of the sampler on which the particles are not resampled. 
This means that the final sample is well spread out over the distribution tt (it 
was found that if resampling happened too close to the end of the sampler then 
several samples might be identical, resulting in poor density estimates being 
produced using the standard techniques). 

It is an interesting and open research question as to whether the sequence 
7s can be chosen in a more principled manner. One option would be to choose 
the sequence in advance using some properties of the distributions ttq and tt. 
An alternative would be to choose the next 7s adaptively while the sampler 
proceeds through the sequence of distributions; however it is not straightforward 
to generalise the proofs of validity of the sampler in this case. 
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3.3. Transition kernels 

For this paper we choose to use Metropolis-Hastings transition kernels for the 
parameters in u. The choice of inverse gamma distributions for the components 
of within ttq means that we can simply use Gibbs sampling steps, [17], to up- 
date those components. At each step s we use a Metropolis-Hastings transition 
kernels K^- Since ttq is an approximation to tt, and tt^ is in some sense between 
TTo and TT, we use the same proposal distributions at each step s. These proposal 
distributions are derived from ttq as described in this section. 

We form a partition {Ti, . . . of {!,... ,P}, where P is the number of 
columns in C, so that [Cxi • ■ • Ci,] is the matrix C, but with columns possibly 
re-ordered; and 

" I'll 

is the corresponding partition of f. (The case J = 1 corresponds to no par- 
titioning.) On each move step of the algorithm we move through the series of 
subsets Tj, for J = 1, . . . , J. We apply a Metropolis-Hastings transition kernel 
to the components i/jj = 

To describe the transitions we introduce the matrices Ej^. , where Yij. is the 
conditional covariance under ttq of vj. given the values of f-ij = (i'i)i^Xj- 
These can be calculated at the start of the algorithm. Recall that since ttq is an 
approximation of tt, the Sx^ matrices therefore correspond to approximations 
of the conditional covariance of vi- given V-i^ under the posterior distribution 
TT. Note that we are here assuming that the approximate covariance matrices 
Tij. are close enough to the truth to be useful as proposal distributions in the 
random walk Metropolis-Hastings kernel. 

The proposal distribution for i/j^. is then a normal distribution centered on 
the current value of with covariance t^Yij.. The acceptance probability for 
the move, applied after rcweighting to get a weighted distribution from tTs, is 
simply calculated from the ratio of tTs values for the proposed and current values. 

The scaling parameters rj" are by default chosen to be 2A/^\Ij \ following the 
heuristic of [25]. However in practice they are usually chosen, based on several 
runs of the algorithm, to ensure that the acceptance rates remain close to 0.23 
(again following [2-5]). Details of specific choices used are given in the examples. 

To update the variance parameters a^, a Gibbs sampling step can be applied. 
Note from (11) that for each s the full conditional distribution of crfj^ is simply an 
inverse gamma distribution, depending on the corresponding vector of regression 
coefhcients u^. However if a different prior is used for then Gibbs sampling 
will not be available and a Metropolis-Hastings update should be performed for 
each tJ,^, in turn. 
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4. Examples 

In this section we will demonstrate the methodology on two examples. The first 
example is a semiparametric Poisson regression model, with simulated data so 
that fair comparisons can be drawn with alternative MCMC approaches. The 
second example is a binary logistic regression involving respiratory infection 
in Indonesian children, with both a semiparametric component and random 
effects. All computations were carried out in the R language [29], using a single 
core AMD Optcron 2.0GIIz. this is similar to running the program on a standard 
PC. 

4.1- Semiparametric Poisson regression 

In this section, we generate n = 500 Poisson random variables yi,i = 1, . . . , n 
from 

yi ^ Poisson(exp {0.7xii + 2x2i + cos(47ra:2i)}) 

where xu is or 1 with probability 0.5, and X2i is uniformly sampled from the 
interval [0, 1]. 

We fit model (3), with 

1 Xii X21 
1 Xi2 X22 

1 Xiyi X2n 

The radial cubic basis function is used to model the function f{x2i) = cos{A'KX2i)- 
This imphes modelling f{x2i) = (3x2X2i + 'Zij:^^u, where for knot points k^, chosen 
to be the (-^^)th quantilc of the imique predictor values, for fc = 1, . . . , if, = 
10, 



u = 



The glmmPQL method of the R statistical package gives an approximate MLE 
for the regression coefficients Ppql, and the variance parameters app^-We follow 
the general algorithm given in Section 3. There are 13 regression coefficients 
to be estimated for this model, and one variance parameter. We updated each 
of the regression coefficients v singly using random walk Metropolis-Hastings 
(RWMH) updates for the move step of the algorithm and a Gibbs update for 
the variance parameter. As with MCMC, the tuning of this kernel is crucial 
to the success of the algorithm; to achieve an acceptance rate in the MCMC 
step between 20-30% we set Tj = 1/3. We also choose the number of steps 
S = 105 and the number of particles N = 1000 based on preliminary runs. We 
will discuss the choices of A'' and S in more detail in Section 5. 



Kx) 



/3o 

/32 



Ul 



[u\al]^ N{0,all), and Z,,. = 



F2i - Kk\ 
l<fc<10 



l<fc,fc'<10 
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We compare the performance of the SMC sampler simulations by monitoring 
the QQ-plots of samples from a simple importance sampler, a single- variable slice 
sampler which updates one parameter at a time and a standard RWMH sampler 
with the same transition kernels as used in the SMC sampler (i.e. those described 
in Section 3.3). Figure 1(a) shows the QQ-plot for the /3i parameter, and the 
corresponding density estimates for f3i is given in (b). With the exception of the 
importance sampler, which can perform badly on different simulated data sets, 
the remaining samplers achieved good concordance. This required 1,000 particles 
with 105 steps for the SMC sampler. For comparison, we used 20,000 iterations of 
both slice sampler and RWMH MCMC scheme, with the first 10,000 discarded as 
burn-in. These took approximately 1394 and 2430 seconds respectively, whereas 
the SMC sampler took approximately 700 seconds. The majority of the gains in 
computational time for the SMC sampler come from the fact that the particles 
at each step can be updated simultaneously without the need to cycle through 
a loop, compared with MCMC samplers where each iteration has to be updated 
sequentially depending on the value of the parameter at the previous step. In 
the R programming language used for this research, as with many other high 
level programming languages, this provides a very significant computational 
advantage. 

The nonparametric fits of the model, calculated using the estimated posterior 
mean of u, are displayed in Figure 2. The model has successfully recovered the 
nonlinearity in the dependency on X2 and fits the data well. 

4-2. Example: Respiratory infection in Indonesian children 

Here we apply sequential Monte Carlo algorithm to an example involving res- 
piratory infection in Indonesian children (see [10, 22]). The data contain lon- 
gitudinal measurements on 275 Indonesian children, where the indicator for 
respiratory infection is the binary response. The covariates include age, height, 
indicators for vitamin A deficiency, sex, stunting and visit numbers (one to six). 

Previous analyses have shown the effect of age of the child to be non-linear, 
hence we use a logistic additive mixed model of the form 

logit{F(respiratory infectioujj = l\Ui, Uk)} — f3Q + Ui + p'^Xij + /(agCj^) 

for 1 < i < 275 children and I < j < rii repeated measures within a child. 
Ui '~ N(0,aij) is a random child effect, Xij is the measurement on a vector 
of the remaining 9 covariates, and / is modelled using penalized splines with 
spline basis coefficients Uk i.i.d. N{0,al^). 

As recommended by we use hierarchical centering of random effects. All 
continuous covariates are standardised to have zero mean and unit standard 
deviation, so that the choices of hyperparameters can be independent of scale. 
Radial cubic basis functions are used to fit the covariate age, where 



/(age) = /3,g„age Z,,,u 
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Fig 1. QQ-plots of SMC sampler output against simple importance sampler, the slice sampler 
and the RW Metropolis-Hastings sampler for f^i (a). The corresponding density estimates (b). 




Fig 2. The data, the true mean values (solid lines), and the estimated mean values (dotted 
lines) for the simulated Poisson example. The fit was based on a SMC sampler run with 1000 
particles and 105 intermediate steps, as described in the text. 



where 

Z,,„ = [|age-Kfcp][|Kfe-Kfe,|3]-i/2 and u^N{0,all) 

1<K<K l<k,k'<K 

with Kk chosen to be the (-^q^jth quantilc of the unique predictor values. We 
take iiT = 20 in this example. 

We use a vague prior A^(0, 10®) for the fixed effects. For both variance com- 
ponents, we use the conjugate Inverse Gamma prior IG(0.01, 0.01). Other prior 
choices are available, see [3U]. Here, random walk Metropolis-hastings updates 
were carried out for each regression coefficients separately, with Gibbs sampling 
used for the variance parameters. The tuning parameters Tj. for the Metropolis- 
Hastings update are again chosen to achieve an acceptance rate in the MCMC 
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coeff. 


density 


summary 


vit. A defic. 


2 -1 


1 2 


posterior mean: 0.61 
95% credible interval: 
(-0.542,1.62) 


male 


-0.5 O^'^^'^.B "^^^^^1^^^ 1.5 


posterior mean: 0.563 
95% credible interval: 
(0.0439,1.06) 


height 


-0.05 C 


0.05 0.1 o.i; 


posterior mean: 0.0338 
95% credible interval: 
(-0.0208,0.0893) 


stunted 


-1 


1 2 


posterior mean: 0.474 
95% credible interval: 
(-0.402,1.31) 


visit 2 


-3 -2 -1 




posterior mean: -1.2 
95% credible interval: 
(-2.1,-0.431) 


visit 3 




posterior mean: -0.629 
95% credible interval: 
(-1.41,0.11) 


-2 -1 1 


visit 4 


-3 -2 -1 




posterior mean: -1 .37 
95% credible interval: 
(-2.3,-0.467) 


visit 5 


-1 -0.5 


0.5 1 1.5 


posterior mean: 0.468 
95% credible interval: 
(-0.158,1.14) 


visit 6 


-1 1 


posterior mean: -0.0384 
95% credible interval: 
(-0.722,0.67) 


st.dev. (subject) 


0.5 1 1.5 


posterior mean: 0.928 
95%i credible interval: 
(0.7,1.23) 



Fig 3. Summary of coefficients in respiratory infections in Indonesian children example. 



step between 20-30%, we used Tj. = 3 for the fixed effect coefficients, Tj. = 6 
for the random effect coefficients, and Tj. = 5 for the sphne coefficients. 

Figure 3 show the results from simulation, using fOOO particles and 305 in- 
termediate steps. The Figure shows borderline positive effect of Vitamin A de- 
ficiency, sex and some visit numbers on respiratory infection. These results are 
in keeping with previous analyses. Figure 4(a) shows the nonlinear effect of 
age; 4(b) shows the effective sample size at each of 300 sequential steps of the 
simulation, vertical lines indicate the occurrence of resampling. 

Again, we compare the performance of the SMC sampler with the importance 
sampler, slice sampler and RWMH sampler with the same transition kernel as 
Step 3 of the SMC sampler. Results for 5,000 samples of the importance sam- 
pler, i,000 SMC particles and 5,000 slice samples with 5,000 burn-in and 5,000 
RWMH samples with 5,000 burn-in arc plotted in Figure 5, good agreements 
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Fig 4. Respiratory infections in Indonesian children example, (a) Posterior mean of the esti- 
mated probability of respiratory infection /(age) with all other covariates set to their average 
values, (b) Effective sample size over 300 distributions, vertical lines indicate instances of 
resampling. 



are found between the SMC, slice and MCMC samplers. The sampler which 
performed badly appears to be the importance sampler, where in this case, the 
sampler appears to suffers from particle depletion where one or few particles 
from an area of high posterior density is dominating the other particles. 

Finally, the SMC sampler took approximately the same amount of time as 
the slice sampler at 2.8 hours and the RWMH took about 9 hours (similarly 9 
hours was required in WinBUGS). In the next section wc discuss the effect of the 
sample size and step size specifications on the efficiency of the SMC sampler. 



5. Improving sampler performance 

In this section we investigate the SMC sampler performance by looking at the 
effects of user-defined specifications such as the number of sequential steps {S); 
the number of particles to sample {N) and block updating strategies. 

Here we base efficiency comparisons on the effective sample size diagnostic 
calculation of [4]. (Note this is different from the ESS [2U] used in determining 
whether resampling is performed.) This diagnostic is essentially an analysis of 
variance approach based on several parallel runs of the algorithm, which provides 
the number of independent samples from the posterior distribution that would 
be required to gain the same degree of accuracy: higher numbers are obviously 
preferrable. This estimate of effective sample size is not affected by resampling 
and in addition, can be used to compare SMC sampler with MCMC approaches 
under a consistent framework. Thus, for a given parameter, we calculate the 
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□ one block N=1 000 
m no blocking N=1 000 
■ one block N=2000 




10s 20s 50s 100s 200s RWMH slice 



Fig 6. Comparison of effective sample size for f3i from the SMC sampler (over increasing 
number of seguential distributions (S), and the slice and RWMH samplers in the Poisson 
regression examples. 



ratio of the average estimate of the posterior variance of the parameter to the 
variance of the posterior means of the parameter across independent runs. All 
our experiments were carried out using the two examples in Sections 4.1 and 
4.2, where /3i is used for the Poisson regression example, and the coefficient of 
Vitamin A deficiency is used for the logistic regression example. 

In experimenting with block/simultaneous updating the parameters for the 
move step of the SMC sampler, we found that in some cases, particularly for the 
logistic example, naive blocking updating (i.e., blocking regression coefficients, 
random effects coefficients, and spline coefficients) can in fact adversely affect 
the performance of the sampler. We also found that careful tuning of accep- 
tance probabilities in the RWMH step to be between 20-30% can be crucial to 
the performance. Finally, we found that increasing the number of sequential dis- 
tributions S, as well as increasing the number of particles N can greatly improve 
the effective sample size. 

Figure 6 shows the effective sample size for /3i calculated from a total of 
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10 independent runs of the sampler for the Poisson regression example. We 
calculated the diagnostic for the SMC sampler over S = 10, 20, 50, 100, 200. For 
each S, we implemented the sampler with a single block update using TV = 1000 
and N = 2000 particles, and a sampler without blocking with N = 1000. For 
comparison, the diagnostic for Pi was also calculated for a (single-variable) slice 
sampler and a RWMH sampler, each of 20,000 iterations with 10,000 burnin. 
The RWMH algorithm used the same transition kernels as Step 3 of the SMC 
sampler, with and without block updating. 

Clearly, the effective sample size of the sampler with no blocking is larger than 
that of the sampler with one block for all 5", and increases with larger values 
of S and N. We can see that the effective sample size of the slice sampler is 
comparable to the SMC sampler with 5 = 50 and N = 1000, while the RWMH 
sampler achieves the smallest effective sample size. 

For the logistic regression example of Section 4.2, we found that by updating 
regression parameters singly, and choosing S = 200 and N = 2000 achieved an 
effective sample size of 2604, and S = 300 and N = 1000 achieved an effective 
sample size of 2377. The two samplers took about 3.8 and 2.9 hours to run 
respectively. As a comparison, the slice sampler with 10,000 iterations with 
5,000 burn in took 2.8 hours to run and achieves an effective sample size of 
1948, while a RWMH sampler of length 10,000 with 5,000 burn in achieves an 
effective sample size of only 177, and taking 9 hours to run. 

6. Conclusion 

In this paper we presented a general sequential Monte Carlo algorithm to pro- 
duce samples from the posterior distribution for Bayesian analysis of generalised 
linear mixed models. The algorithm is an alternative to the popular Markov 
chain Monte Carlo methods. We have demonstrated that the algorithm can 
handle problems where the number of parameters to be estimated in the model 
is high. For example, in the spline formulation of the Indonesian children exam- 
ple, there were over 300 parameters in the model. In addition, the algorithm is 
generally easy to apply. We have also demonstrated that it can have substan- 
tial gains in computational time over traditional MCMC in both a simulated 
poisson example and a real data binomial example. Finally, perhaps the biggest 
advantage of SMC over MCMC samplers is the fact convergence of SMC sam- 
plers does not rely on convergence of Markov chains, which can be problematic 
in designing more efficient algorithms in complex problems. 

We have found that in the context of Bayesian GLMM analysis, the design 
of the initial sequential Monte Carlo distribution may be helped by using ap- 
proximate parameter estimates from classical GLMM analysis, such as using the 
PQL method to find MLE of the likelihood. Note that in the case where such 
estimates cannot be easily found, and the only sensible choice is a diffuse prior, 
then a SMC sampler with many more particles and sequential distributions will 
be needed to obtain good results. In choosing the schedule for the tempering 
sequence 7s in (7), we have found no substantial difference between the different 
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types of schedules currently used in the literature, hence we recommend that a 
simple linear schedule be adopted in the GLMM context. We have also found 
that by tuning the acceptance rate of the Random- Walk Metropolis-Hastings 
kernel in the Move step of the SMC sampler to around 20-30% significantly 
improves the performance of the sampler, see [!)], although this practical finding 
does not yet have rigorous theoretical support. 

Finally, in implementing the Move step of the SMC sampler, one has some 
degree of flexibility when the Markov chain Monte Carlo update is used. For 
example, one may consider a better choice of proposal distributions for the 
Metropolis-Hastings algorithm, by allowing the algorithm to automatically scale 
a proposal distribution, see for example [5]. Here a major advantage over the 
traditional MCMC is that the algorithm does not suffer from the restrictions 
associated with a Markov chain, and information from previous samples can be 
freely used to obtain future samples. Furthermore, one is not restricted to only 
MCMC type of moves in this step, other move types are possible, see [8]. 

However, sequential Monte Carlo algorithms are not black-box algorithms, 
requiring a certain amount of tuning and user input. In particular, one needs to 
set the number of sequential distributions (5') the number of particles to sample 
(N) and tuning parameters for the Metropolis-Hastings kernels in the move step 
of the algorithm. 
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Appendix: Algorithmic description of the SMC sampler method for 
GLMMs 

In this appendix we give a detailed description of how to use the SMC sampler 
method to perform inference in GLMMs. We use the notation of Section 2; 
choices made in the implementation of the algorithm are explained in Section 3. 

For any subset X of {1, . . . ,P} we write Cj for the submatrix of of the 
design matrix C consisting of columns in X, C_x for the submatrix consisting 
of columns of C not in X, vx and i/_x for the analogously defined subvectors 
of V. Also for any square matrix Q we write Qu for the square submatrix 
corresponding to rows and columns in X, Qi,-i for the submatrix with rows 
in X and columns not in X, Q_x,i for the submatrix with rows not in X and 
columns in X, and Q-i,-i for the square submatrix with rows and columns not 
in X. 
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Initialisation 

• Set the number of particles N and the number of intermediary distribu- 
tions S. 

• Construct a vector 7 with sth entry V'(s), s = 0,1,..., S*, where 'ip : 
{0,1,..., 5} [0,1] is an increasing function such that -0(0) = and 
ip{S) = 1. For the results in this paper we used ip{s) = min{l, s/{S — 5)}. 

• Construct subsets Ti, . . . ,Zj of {1, . . . , P} such that U/=i 2j = {li • • • i P}- 
The case J = 1 corresponds to no blocking of variables for the move step. 

• Set tuning parameters > 0, j ~ 1, . . . , J for the Metropolis-Hastings 
updates. Usually these will be set based on preliminary runs of the algo- 
rithm, and convenient defaults are Tj = 2.4/-\/|X,- 1. 

• Use the Breslow & Clayton (1993) penalised quasi-likelihood (PQL) algo- 
rithm to obtain initial estimates: 

PpQL and JpQL- 

This is facilitated by software such as glmmPQLO in the R package MASS 
[29]. Use these estimates in (8) to calculate S. 

• For each j ~ 1, . . . , J, calculate the conditional covariance under ttq of 
i/j conditional of If Q = S^^, then this conditional covariance is 



Initial sample from, ttq 

• Produce a sample of size N from ttq: for each i = 1, . . . ,N sample i/i from 
the normal distribution with mean Ppql and covariance S, then sample 
(T^ from the conditional inverse gamma distributions (9). 

• Set the weights m; = 1/A^ for each i = 1, . . . , N. 

Sequential sampling from each tts 
For each s = 1, . . . , in turn, 

Reweight For each i ~ 1, . . . , N, update Wi according to 

7r,(t/„a|) _ / 7r(z.„(7f) 

then normalise the weights by setting Wi <— Wi/'Y^^^^^ Wj- To avoid over- 
flow and underflow problems it is recommended that logarithms be used 
in this step. 

Resample Calculate the effective sample size (ESS) using 




N N 

1=1 1=1 
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If ESS < N/2 (or if s = min{s : 7^ = 1}) then rcsaniplc the particles. The 
naive version of resamphng, which introduces unnecessary Monte Carlo 
variation into the scheme, simply samples (with replacement) from the 
pool of particles, with particle i selected with probability Wi. However in 
our implementation we use stratified resampling [21] to reduce the Monte 
Carlo variation. After resampling set Wi = 1/N for alH = 1, . . . , A^. 
Move • For each j = 1, . . . , J and each i = 1, . . . , A^, generate proposals 
{i>^)I, ~ A^((i>0x,.,r/5]iJ, l<i<N. With probability 

a'-'-' = max <^ 1, ^ } 

accept the proposal and set (fi)xj = (i^i)ij- Otherwise reject the 
proposal and leave (t'i)x unchanged. Again, it is recommended that 
logarithms be used when calculating a to avoid overflow and under- 
flow problems. Note that several parts of the ratio in the calculation 
of a are the same in both the numerator and denominator and need 
not be calculated. 

• For each £ = 1, . . . ,L, and for each i = 1, . . . , A, sample (cf)^ 
from the inverse gamma distribution with shape A^i + qi/'i and rate 
Aui + ||u£|p/2. Note that if inverse gamma distributions are not used 
as the prior distribution for then sampling from inverse gamma 
distributions here would not result in a transition kernel that admits 
TTg as a stationary distribution. Instead further Metropolis-Hastings 
can be used for each cr^^ in turn. 

Note that the decision to resample on the first step at which 7^ = 1 means 
that the final sample is an unweighted sample from tt. Hence standard techniques 
for dealing with samples from posterior distributions can be used. However the 
plug-in rule for the bandwidth used in the density estimates performed poorly 
for resampling close to step S, since some particles were identical. This is the 
reason that we generally set 75-5 = 1 and finish with five applications of the 
transition kernel to the unweighted sample, resulting in a suitably diverse sample 
from TT. 
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